ABSTRACT

This paper reviews the methods and findings of
published research on the validity of police selection procedures. As
a preface to the review, the typical police officer selection process
is briefly described. Several common methodological deficiencies of
the validation research are identified and discussed in detail: (1)
use of post-selection research designs; (2) inappropriate comparison
groups; (3) non-meaningful outcome variables; (4) alpha-inflated
analysis; (5) over-emphasis of beta weights; and (6) the search for
moderator variables. Validity evidence for several types of selection
variables is discussed including biodata, measures of intellect,
personality measures, interviews, interest inventories, and
subjective background ratings. Of the 14 biodata categories
researched only 5 were validated as predictors of poor police
performance (prior involuntary termination, criminal and vehicle code
convictions, having been married more than once, and short duration
of prior jobs). Measures of intellect, subjective background ratings,
and personality measures provided mixed evidence of validity. Some
scales of the Minnesota Multiphasic Personality Inventory were found
to have post-selection validity in more than one study. There was no
meaningful evidence of validity for interest inventories or
interviews as police selection procedures. (MCF)

Abstract

The methods and findings of police-selection validation studies were
reviewed. Several common methodological deficiencies were identified,
involving (a) predictor data collected after the officers were hired, (b)
inappropriate comparison groups, (c) contaminated outcome variables, (e)
non-meaningful outcome variables, (f) alpha-inflated analyses, (g)
overemphasis of beta weights, and (i) the search for moderator
variables. The validity evidence for several types of selection
variables, including biodata, measures of intellect, measures of
personality, interviews, interest inventories, and subjective background
ratings was found to range from scant to promising.

In most jobs the costs of making a mistake in hiring involve low
productivity, theft and early turnover. With police officers, however,
the costs can reach monumental levels. For when an officer makes a
mistake, someone may die; when an officer acts oppressively, trust in
government may erode; and when an officer quits or is fired, $10,000 or
more in training costs (Mills & Stratton, 1982) may be lost. Hence, the
selection of police officers is or should be a central concern of all of
the thousands of police agencies across the nation, including municipal
police forces, sheriffs' offices, state patrols, military police and
various federal agencies such as the Federal Bureau of Investigation.

Because of the importance of hiring good police officers,
researchers have given considerable attention to the matter. The first
reported study of the validity of a selection method was by Martin in
1923, and since then more than 40 reports of studies have been
published.

Brief reviews of the earlier studies can be found in Cohen &
Chaiken (1972), Ghiselli (1966, 1973), Kent & Eisenberg (1972),
Lefkowitz (1977), Poland (1978), and Speilberger, Ward & Spalding
(1979). Also, Henderson (1979) reviewed the validity of personality and
attitude scales, and Sparling (1975) reviewed the validity of education
as a selection criterion.

However, many research design, statistical and reporting problems with
the research were not discussed in any of the reviews (e.g., use of
inappropriate comparison groups). Further, none of the reviews
contained an in-depth examination of the different types of outcome and
predictor variables that have been used in the research. Finally, the
reviews did not include any of over a dozen validation studies reported
in the past few years. Sadly, many of the recent studies are replete
with the same methodological errors made in prior research.

The purpose of this review is to provide a careful examination of
the methods that have been used in research on police selection and an
up-to-date review of the results of research on all the types of
psychological selection variables examined in the studies.

This review includes only published reports. Unpublished reports
are excluded because they are difficult or impossible to obtain and
because they have not been subjected to the scrutiny of the scientific
community and so may contain methodological deficiencies that escape any
single reviewer (see e.g., the scathing but different reviews of the
Humm & Humm (1950) study by Blum (1964, pp. 106-107) and Ruch (1965)).

As a preface to the review, an outline of the typical police
officer selection process is in order. Abramson (1974), in a survey of
266 college or university police departments, found that the vast
majority required that applicants be at least 21 years old, be a high
school graduate, have no criminal record, and pass a background
investigation of their morals and personal character. Almost every
department used an interview with detectives or administrators as a
screening device. A significant number of departments also used a test
measure of intellect, a clinical interview by a
psychologist/psychiatrist and/or a psychological test. Parish, Rios &
Reilly (1979) surveyed 130 police departments and found that the
Minnesota Multiphasic Personality Inventory (MMPI) was the most used
psychological test, being used by 43% of the departments. The next most
popular tests were the California Personality Inventory, the Edwards
Personal Preference Schedule and the 16 Personality Factor, in that
order. None of these other tests was used by more than 16% of the
departments.


Research Methods 


Post-Selection Research


The most typical research design used in police-selection
validation studies involves determining the correlation between
selection variables and outcome variables. Three types of correlational
research studies have been done. One type, which will be called
pre-selection research, involves the collection of the selection data
prior to the hiring of the subjects (e.g., the study of Marsh, 1962). A
second type, which will be called post-selection predictive, uses data
collected after the subjects were hired but before the collection of the
outcome data (e.g., the study of Speilberger, Spalding, Jolley & Ward,
1979). The third type, which will be termed post-selection concurrent,
uses selection data collected after the subjects were hired and at about
the same time as the collection of the outcome data (e.g., the study of
Cascio, 1977). This third research design is the most common one in
police-selection studies.


Both post-selection designs create major problems in interpreting
results because of possible differences between actual applicants and
the officers who participate in the post-selection research. First,
there is a likelihood of different levels of faking good. The
motivation to present oneself favorably is far less in the case of
individuals who have already been hired (Rothe, [year illegible]).
Several studies of various jobs found that persons in screening
situations tend to produce substantially more socially desirable
responses to psychological assessment (Elliot, 1976; Michaelis &
Eysenck, 1971; Kirchner, 1962; Bass, 1957; Heron, 1956; Herzberg, 1954;
and Green, 1951), including biodata requests (Schrader & Osburn, 1977).
Hence, variables that correlate with outcome measures in post-selection
research may not do so with subjects who have more reason to fake good.
This risk, of course, does not apply to selection variables that cannot
be faked (e.g., measures of maximum performance; Barrett, Phillips &
Alexander, 1981), and it may apply only to a limited degree to variables
that are rarely faked because of the chance of being caught (e.g.,
self-report of the highest educational degree earned).

The second difference between pre-selection and post-selection
studies involves the level of arousal. Officers already hired may be
less aroused when completing a performance test such as an intelligence
test. Differences in arousal can produce differences in performance,
either increasing or decreasing it, depending on the individual and his
optimum level of arousal (Hebb, 1949). Hence, performance tests that
correlate with outcome variables in post-selection research may not do
so when applied to applicants because with them arousal differences mask
differences in ability -- which could be the crucial part of the
performance variable. On the other hand, performance tests found not to
correlate with outcome variables in post-selection research might
actually be useful in selection because they measure important
differences in arousal control. They might only seem invalid because
there is no arousal-control problem in post-selection subjects.

The final difference between pre-selection and post-selection
studies involves differences in job experience. It is possible that
personality types that are concurrently associated with good performance
in veteran officers may be associated with subsequent poor performance
in applicants. For example, officers who are good at their work may
develop a personality trait of high self-appraisal. However, applicants
who are high in self-appraisal may turn out to be officers who take
needless risks.

Because of these difficulties in interpretation, post-selection
research findings should be considered no more than suggestive with
regard to the validity of their use in the selection of employees
(Anastasi, 1976; Cascio, 1978; Dunnette, 1966; Guion, 1976; Siegal &
Lane, 1974 and Zednick & Blood, 1974). Pre-selection research is
clearly preferable.


Inappropriate Comparison Groups

Several police-selection studies examined differences between
officers who had resigned and officers who remained (Cross & Hammond,
1951; Finnigan, 1976; and Thweatt, 1972) in an attempt to establish what
Anastasi (1976) called "validity through contrasted groups." Of these
studies, three had serious confounds not mentioned in the research
reports.

Finnigan (1976) attempted to determine whether a college education
leads to better performance by police officers. The study compared a
group of officers who had a college degree and a group of officers with
only a high school diploma. However, in addition to differing as to
education, the two groups differed in several other important respects.
The college-educated officers were hired separately, were referred to in
the police agency as "agents" and had to have a minimum amount of
field duty before qualifying to be hired. Further, although the report
does not say so, it seems likely that the "agents" were paid more. Any
of these other differences could account for differences between the two
groups in outcome variables.

Thweatt (1972) compared a group of 50 experienced officers,
sergeant and below, and a group of subjects who had resigned within 16
months of being hired. Apparently the groups were different regarding
year when hired. This difference could account for differences between
groups on selection variables.

Finally, Cross & Hammond (1951) compared current officers who had
been employed "for at least one year" and officers who had resigned or
been dismissed within the past three years. As in Thweatt's study,
group differences in date when hired may have been a serious confound.

Outcome Variable Contamination

Outcome-variable contamination occurs when a predictor variable
artificially influences an outcome measure. For example, one of the
outcome variables in Finnigan's (1976) study was supervisor ratings. It
seems likely that the knowledge by supervisors that one group was more
' Malouff/3/A-8 9 . 
highly educated led by itself to some differences in ratings. This type
of confound can occur in any pre-selection study that uses supervisor
evaluations as outcome measures if the tested selection information is
available to supervisors (McDonough & Monahan, 1975). The reports of
most police-selection studies made no mention of this possible problem.

Unlawful and Unfair Selection Variables

Some police validity studies examined selection variables which
could not lawfully or ethically be used. A prime example is race (e.g.,
Spencer & Nichols, 1971), which cannot lawfully be used in selection
decisions by any police agency (42 U.S.C. Sec. 2000).

It may also be unlawful to use other biodata such as place of birth
(e.g., Cohen & Chaiken, 1972) and length of residence (e.g., Levy, 1967)
in the area of the police agency. Use of these biodata by police
agencies would appear to violate the constitutional right to interstate
travel. See Shapiro v. Thompson (1969), which held that residency
duration requirements for receiving welfare benefits violate this
right.

Another inappropriate variable is number of children, examined as a
selection variable by Cohen & Chaiken (1972). Use of this item by
police agencies would appear to violate the constitutional rights of
privacy and freedom of action with regard to procreation. See
Griswold v. Connecticut (1965), in which the U.S. Supreme Court held
that couples have a constitutional right of privacy which allows them
alone to decide whether to use contraceptives.

Other questionable items are father's occupation, number of
siblings, number of family members with a mental disorder (Cohen &
Chaiken, 1972), birth order, number of parental residences (Leiren,
1973), (Levy, 1967; Speilberger, Spalding, Jolley, & Ward, 1979), and
whether the applicant's father was home during the applicant's childhood
(McDonough & Monahan, 1975). The use of some of these variables in
selection would likely have the effect of screening out disproportionate
numbers of minority applicants, who might tend to score low on these
items. Moreover, the use of the items would be out of keeping with the
American ideal of judging individuals by what they are or have done,
rather than something that a parent did.

Meaningful and Non-Meaningful Outcome Variables

Conceptually inappropriate variables. Validation studies are
necessarily no better than their outcome variables. It simply does not
matter whether a selection variable correlates with a meaningless or
completely ambiguous outcome variable.

Many different outcome variables have been used in police-selection
research. Among those that are at least somewhat meaningful are the
following: (a) involuntary termination (e.g., Levy, 1967); (b) citizen
complaints (e.g., Cohen & Chaiken, 1972); (c) disciplinary accusations
or actions (e.g., Cascio, 1977); (d) workers' compensation claims/times
injured/days sick (e.g., McAllister, 1970; Snibbe, Azen, Montgomery &
Marsh, 1973); (e) sick leave abuse (e.g., Spencer & Nichols, 1971); (f)
preventable auto accidents (e.g., Fabricatore, Azen, Schoentigen &
Snibbe, 1978); (g) resignation within two or three years (e.g., Azen,
Snibbe, Montgomery, Fabricatore, & Earle, 1974); (h) department
commendations (e.g., Baehr & Froemel, 1977); (i) letters of
commendation from the public (e.g., Blum, 1964, study 4); (j) promotions
(e.g., McDonough & Monahan, 1975); (k) supervisor ratings (e.g., Bartol,
1982); and (l) peer ratings (e.g., Henderson, 1979).

Several variables used for outcome measures are conceptually
inappropriate. First, there is selection for hiring (e.g., Saxe &
Reiser, 1976; Hogan & Kurtines, 1975; Spencer & Nichols, 1971). In some
sense, selection for hiring can be viewed as a type of concurrent
validity outcome measure. For instance, new depression scales are often
evaluated against the MMPI depression scale. However, this type of
validity check presupposes that the criterion measure has itself been
shown to have predictive validity, that is, to be associated with
performance measures. In the case of selection for hiring, there is no
reason to believe that the selection is valid. Hence, selection for
hiring is a meaningless outcome measure.
Second, there is training academy performance (e.g., Lester, 1979;
Gordon & Kleinman, 1976; Hogan & Kurtines, 1975; Flynn & Peterson, 1972;
Hogan, 1971; McAllister, 1970; Mormon, Hankey, Heywood & Liddle, 1966;
Mormon, Hankey, Kennedy & Jones, 1966; Mills, McDevitt & Tomkin, 1966;
Mullineaux, 1955; and Dubois & Watson, 1950). On the one hand, it is
important that selected police candidates survive training in order to
minimize training costs. On the other hand, training performance is not
job performance, and to the extent that the two are unrelated, (a) it is
unimportant whether a recruit finished first or last in his police
academy class, and (b) it is pointless to fail anyone, so prediction of
termination during training is also unimportant. The question is, then,
what is the relationship between training performance and job
performance? Four relevant studies have been reported.

Cohen & Chaiken (1972) found that high academy grades were
positively associated with many awards, promotion through civil service
exam, few complaints, few trials, few substantiated complaints against an
officer, and few times sick. The study found no association with eight
other outcome variables.

Gottlieb & Baker (1974) found an association between academy score
and performance-based supervisor ratings.

Azen, Snibbe, Montgomery, Fabricatore & Earle (1974) found no
significant association between academy variables of (a) academic
performance, (b) physical performance, (c) marksmanship, and (d)
instructor ratings and outcome variables of resignation within two years
and supervisor ratings.

Leiren (1973) found no relationship between academy performance and
absenteeism, accidents, supervisor ratings, or commendations.

Hence, the evidence of an association between training performance
and job performance is mixed. It may be that police departments vary
widely in their training and evaluation of training performance. It
therefore seems unwise to assume in any particular agency or study that
academy variables predict job performance. Thus training
performance is generally not meaningful as a performance variable.

Third, there is the variable of no resignation ever (Levy, 1967 &
1973; Marsh, 1962; Snibbe et al., 1973; Blum, 1964, studies 3 & 4). It
is reasonable for police agencies to want new police officers to stay
with the department for at least a few years. The high cost of training
makes rapid turnover through resignation a major problem. Therefore, an
outcome measure of whether recruits stay at least two or three years is
a meaningful one. However, it is doubtful whether longer tenure is
meaningful. Burkhardt (1980) suggested that long tenure may be a
negative outcome in that only the worst officers remain because (a) they
cannot find better jobs, and (b) they fit into an existing police
subculture that emphasizes alienation, cynicism, secrecy, isolation, and
non-motivation. Although there is no direct empirical support for
Burkhardt's first point, it has some intuitive appeal. Further, there
is evidence that the subculture Burkhardt describes is a common one in
police departments, and that it does tend to pressure recruits into
becoming part of it or leaving the department (Van Maanen, 1975).

Even aside from Burkhardt's argument, it is unclear whether long
tenure is a favorable outcome. Although training costs are saved and
experience in the department may be helpful, some turnover may be good
in that it helps to eliminate officers who are experiencing burn out and
to add officers with new ideas and recent training.

The best view seems to be that turnover within 2 or 3 years is so
costly that it is a valid outcome variable, but later turnover cannot be
clearly interpreted as either good or bad. Hence, resignation after a
few years is not a valuable outcome measure.

Fourth, there is the number of years as a police officer (Baehr &
Froemel, 1977; Marsh, 1962). This variable has little to recommend it
as an outcome measure. In addition to the problems mentioned above
regarding resignation, it has virtually no meaning because it primarily
measures the year in which officers were hired.

Fifth, there is the absolute number of positive or negative
incidents or accomplishments with wide differences in opportunity (e.g.,
Spencer & Nichols, 1971). Outcome measures are meaningful only if
opportunity is either equal for all subjects or is controlled for
statistically. When the matter of opportunity is ignored and
substantial differences exist, error variance occurs in the outcome
variables, making it difficult to determine the true validity of
selection variables.

In some cases the difference in opportunity can create a major
confound. This may have occurred in the study of Baehr, Saunders,
Froemel, & Furcon (1973), who found that black officers made twice as
many arrests as white officers. One might speculate that blacks were
better officers or that the black officers were assigned to minority
neighborhoods with high crime rates. Assuming the latter, the
researchers could have statistically controlled for the difference in
opportunity by using as an outcome measure the number of arrests by an
officer divided by the total number of arrests in the officer's precinct
during the same time period.
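
The ratio just described can be sketched in a few lines of Python. This
is only an illustration of the arithmetic, not a procedure taken from
any of the reviewed studies; the function name and the arrest counts are
hypothetical and were chosen to show how a raw count and an
opportunity-adjusted share can order two officers differently.

```python
def opportunity_adjusted_share(officer_arrests, precinct_arrests):
    """Return an officer's arrests as a share of all arrests made in the
    officer's precinct during the same time period."""
    if precinct_arrests <= 0:
        raise ValueError("precinct_arrests must be positive")
    if not 0 <= officer_arrests <= precinct_arrests:
        raise ValueError("officer_arrests must lie between 0 and precinct_arrests")
    return officer_arrests / precinct_arrests


# Hypothetical figures, not data from any study.
officer_a = opportunity_adjusted_share(40, 2000)  # busy, high-crime precinct
officer_b = opportunity_adjusted_share(20, 500)   # quieter precinct
```

In this made-up case Officer A has twice the raw arrests of Officer B
(40 versus 20) but only half the adjusted share (.02 versus .04), so the
two measures would lead to opposite conclusions about who performed
better.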


Another likely victim of this type of error variance is any study
using number of auto accidents as an outcome variable (e.g., Leiren,
1973). If all the subjects drove roughly an equal amount, there would
be no problem. If, however, some drove much more than others, it makes
little sense to use number of auto accidents as an outcome measure.

The question of opportunity is one that should be considered
whenever an outcome variable is used that involves number of incidents
or accomplishments. Researchers should not assume equality of
opportunity even with regard to variables such as number of
commendations. If as a practical matter commendations are given only to
officers who go on patrol, there may be substantially less opportunity
for officers who serve as administrators, guards or in other non-patrol
positions.

In some cases it is very difficult or impossible to control for
opportunity. For example, if the subjects move from one
precinct and job to another, determining opportunity can become a
nightmare. In these cases, researchers should at least point out the
differences in opportunity so others can better interpret the findings.

Sixth, there is the number of accusations of wrongdoing by officers
determined to be unfounded (Cascio & Real, 1977; Spencer & Nichols,
1971). This variable makes no sense as an outcome variable. Unfounded
accusations are neither good nor bad indicators of performance.


Lack of coverage by outcome variables. In addition to the use
of inappropriate outcome variables, there is a problem of too few
outcome measures relating to performance other than arrest-type actions.
Job analyses done on the work of police officers have shown that they
tend to spend about 85%-90% of their time doing work unrelated to crime,
including mediating family disputes and writing accident reports
(Lefkowitz, 1977). None of the studies surveyed included any direct
measure of these variables. Although supervisor ratings could be an
indirect measure, it may be doubted whether supervisors either have
information on the execution of these tasks or particularly care. Police
supervisors may tend to pay more attention to crime-fighting aspects of
performance (Burkhardt, 1980).

Some steps have been made in the direction of assessing performance
unrelated to crime-fighting. Carr, Larson, Schnelle & Kirchner (1980)
reported efforts to assess different types of police performance such as
report writing and testifying. They used carefully constructed rating
forms. As yet, there appears to be no reliability or validity data on
approaches of this sort.

Low-Variance Problems With Outcome Measures. Rosen & Meehl (1955)
noted the statistically based difficulty of predicting infrequent
events. The same problem applies to outcome variables with
low variance.

Outcome variables with inadequate variance cannot always be
identified in research reports because their variance often is not
reported. A likely candidate for inadequate variance in many studies is
departmental ratings, because they tend to have such a leniency
(ceiling) effect that there is little or no difference among subjects.
Variables which surely had inadequate variance include whether an
officer has been charged with a crime while employed (Cohen & Chaiken,
1972 -- 8% were charged) and being fired within a few years of being
hired (McAllister, 1970 -- 5 were fired compared to 60 retained).
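
A short calculation shows why these two variables are poor candidates.
For a dichotomous outcome with proportion p in one category, the
variance is p(1 - p), which reaches its maximum of .25 when p = .50. The
Python sketch below applies that standard formula to the two base rates
quoted above. The function name is illustrative, and treating 5 fired
and 60 retained as a proportion of 5/65 is an assumption about how that
sample should be counted.

```python
def dichotomous_variance(p):
    """Variance of a 0/1 variable whose proportion of 1s is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)


charged = dichotomous_variance(0.08)        # Cohen & Chaiken (1972): 8% charged
fired = dichotomous_variance(5 / (5 + 60))  # McAllister (1970): 5 fired, 60 retained
maximum = dichotomous_variance(0.50)        # largest variance a 0/1 variable can have
```

Under these assumptions both outcomes have a variance of roughly .07,
less than a third of the maximum of .25, which leaves little room for
any selection variable to correlate with them.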


Researchers would be well advised to determine whether selection
and outcome variables being considered for a study have sufficient
variance to warrant the time and effort needed to collect and analyze
the necessary data. Unfortunately, low variance cannot always be
predicted in advance. When it effectively scuttles an outcome variable,
research reports should note this so that the selection variables tested
will not be mistakenly written off by others.

Rating Problems. Ratings are the only type of outcome measure that
have received significant attention with regard to their usefulness.
Traditionally, the only type of rating used as an outcome measure was an
absolute supervisor rating on a characteristic or job behavior. This is
still a common outcome measure in police-selection studies. Several
problems exist with these ratings. First, there are the leniency and
low-variance problems mentioned above. Second, there is doubt about
their meaningfulness. Third, ratings on different dimensions are very
highly correlated, suggesting a halo effect that makes use of the
dimension scores dubious.

Recent police studies by Cascio & Valenzi (1978) and Baehr &
Froemel (1977) found few or no associations between supervisor ratings
and objective measures of performance, suggesting that the ratings may
not be meaningful performance measures. Similar findings have been
reported with non-police employees (e.g., Hausman & Strupp, 1955; and
Seashore, Indik & Georgeopoulos, 1960).

Many attempts have been made either in police research or other
selection research to overcome these problems. These attempts involve
using different persons doing the ratings and different kinds of rating
formats. One dcaihetounton: study (celine & Sutton, 1979) explored the 
possibility of using citizen ratings of police officers, but this
approach appears to be plagued with methodological problems (e.g.,
standardization of ratings) and cost problems.

Two police-selection studies (Bass et al., 1954; Henderson, 1979)
used peer rankings as an alternative to, or in addition to, supervisor
ratings. Two other studies (King, Hunter & Schmidt, 1980; Love, 1981)
examined the validity of peer rankings or ratings in police-performance
evaluations. King et al. found that peer rankings were correlated with
supervisor ratings and rankings. Love (1981) found that peer rankings
and ratings were both correlated with supervisor rankings and ratings.
However, another report of the study (Love, 1983) indicated that high
peer ratings were also correlated with more on-the-job injuries. Love
(1981) also found that the officers tended to dislike making the
ratings.

Non-police studies on the validity of peer ratings as an outcome
measure have produced mixed results (Gruenfield, 1981).

Two developments in rating formats, behaviorally anchored rating
scales (BARS) and paired comparison, have made their way into
police-selection studies.

BARS are scales developed so that each has a series of ranked
behavioral (performance) statements "anchoring" points along the scale.
The process of creating these statements, as first described by Smith &
Kendall (1963), is time-consuming and complicated. It was originally
hoped that the scales would reduce halo effects by making scores more
meaningful.

As yet, BARS have not clearly been shown in non-police studies to
have any advantage that would justify the time and effort needed to
create them (Gruenfield, 1981). The findings of two studies using
police officers suggest that BARS do not improve outcome measurement.
Cascio & Valenzi (1978) compared eight carefully created BARS with each
other and 15 objective measures of police performance. The study found
that the BARS had intercorrelations ranging from .84 to .91. This
suggested that the ratings did not discriminate among different aspects
of performance. The study also found that none of the BARS was
significantly associated with the best linear composite of the objective
measures, as determined through multiple regression. This suggests that
the BARS lacked validity.

Landy, Farr, Saal & Freytas (1976) collected BARS data in 58 police
agencies. The overall results were that median intercorrelations among
the BARS were about the same as interrater reliability, meaning that
scale ratings for an individual could be predicted nearly as well from a
rating on another scale as from another rater's rating on the first
scale. Hence, for an individual subject, any difference between scale
scores was almost meaningless.


Rankings have been used instead of ratings in police-selection
studies (e.g., Azen et al., 1974) in an effort to increase variance in
subject scores and, thereby, increase validity. A police study by Love
(1981) provided some evidence that rankings are more valid.

Three types of rankings have been examined with police: paired
comparisons, whole-group rankings and nominations. Obtaining
paired-comparison ratings involves requiring a rater, usually a
supervisor, to compare all possible pairs of subjects on the variable of
interest. A total score is then derived for each subject, such as by
adding the number of times he or she was named as superior (Fabricatore
et al., 1978).
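The scoring rule just described can be sketched in a few lines of
Python. This is a minimal illustration only: the officer labels and the
`judge` callback are hypothetical, and counting "wins" is simply the one
scoring option the text mentions.

```python
from itertools import combinations


def paired_comparison_scores(subjects, judge):
    """Score subjects by counting how often each is named superior.

    `judge(a, b)` is a hypothetical stand-in for the rater's decision;
    it must return whichever of the two subjects the rater prefers.
    """
    scores = {s: 0 for s in subjects}
    # The rater sees every possible pair exactly once.
    for a, b in combinations(subjects, 2):
        scores[judge(a, b)] += 1
    return scores


# Illustrative data only: an invented "true" standing drives the rater.
standing = {"Officer A": 3, "Officer B": 1, "Officer C": 2, "Officer D": 4}
scores = paired_comparison_scores(
    list(standing), lambda a, b: a if standing[a] > standing[b] else b
)
print(scores)
```

With n subjects the rater must make n(n - 1)/2 judgments, so the
method becomes burdensome as the group grows.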

Whole-group rankings have also been used (Mormon, Hankey, Liddle &
Goldwhite, 1967) for this purpose, but not recently. This lack of use
may be the result of concern about the appropriateness of using ordinal
data in multiple regression analyses, which have become commonplace.
Although some controversy remains, there appears to be growing
acceptance of quantitative analyses of this sort with ranked data
(Roscoe, 1975).

Nominations are similar to whole-group rankings except that each
rater names only the few top performers in order of quality of
performance. That saves raters from having to give low rankings to
anyone. Total scores are based on all the ratings for each individual.

Gruenfield (1981) suggested that the problems associated with
ratings occur when the raters lack (a) training in making
ratings, (b) information on the subjects, and (c) incentive for making
valid ratings. These problems generally have been ignored in
police-selection research.

Alpha Inflation

The most conspicuous statistical problem in police-selection research
involves ignored alpha inflation. Alpha inflation means that as the
number of either selection or outcome variables exceeds one, the
likelihood that at least one correlation will be significant at p < .05
or .01 because of chance increases. In other words, Type I error becomes
more likely. For example, Spielberger, Spaulding, Jolley & Ward (1979)
apparently examined [number illegible] scales of the Strong-Campbell
Interest Inventory and found for males that four scales were associated
with resignation without being eligible for rehiring. One would expect
about that number of "significant" correlations by chance.

A related problem has occurred with the use of multiple regression.
For example, McDonough & Monahan (1975) reported a multiple regression
correlation of .50 between predictors and a performance rating, with 92
subjects. The report of the study expressed surprise that the
correlation would be this high when the criterion rating had an
interrater reliability of only .41. The ratio of 54 predictors to 92
subjects suggests that the multiple R was primarily the result of alpha
inflation. Multiple regression significance tests control for alpha
inflation, but none was reported in the study. A more subtle type of
multiple-regression-related alpha inflation occurred in the study of
Spencer & Nichols (1971), which examined dozens of selection variables.
Only the top few were entered into a multiple regression analysis, thus
allowing calculation of a misleading significance level (see Nunnally,
1978).
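A rough check of the McDonough & Monahan figures illustrates the point.
The sketch below applies the standard F test for a multiple
correlation, F = (R^2 / k) / ((1 - R^2) / (N - k - 1)), to the values
quoted above (R = .50, k = 54 predictors, N = 92 subjects). It assumes
that all 54 predictors were entered at once, which the report may not
have done, so the result is illustrative rather than a reanalysis.

```python
def f_for_multiple_r(r, n_subjects, n_predictors):
    """F statistic for testing a multiple correlation R against zero."""
    r2 = r ** 2
    df_model = n_predictors
    df_error = n_subjects - n_predictors - 1
    f = (r2 / df_model) / ((1 - r2) / df_error)
    return f, df_model, df_error


def adjusted_r2(r, n_subjects, n_predictors):
    """The usual adjusted R squared, which corrects for shrinkage."""
    r2 = r ** 2
    return 1 - (1 - r2) * (n_subjects - 1) / (n_subjects - n_predictors - 1)


# Values quoted in the text; entering all predictors at once is assumed.
f, df1, df2 = f_for_multiple_r(0.50, 92, 54)
print(f"F({df1}, {df2}) = {f:.2f}")
print(f"adjusted R^2 = {adjusted_r2(0.50, 92, 54):.2f}")
```

Under that assumption F is about 0.23, well below 1, and the adjusted
R squared is negative, which is the pattern one would expect from a
multiple R produced largely by chance.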


There are relatively simple solutions to these problems. First,
statistically control for alpha inflation whenever multiple selection or
outcome measures are used and report the resulting significance levels.
Second, if in an unusual case alpha inflation cannot be statistically
controlled for, a statement of the problem would suffice. Third,
whenever possible, cross-validation or replication should be attempted
to confirm significance.
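The size of the problem, and the simplest statistical control for it,
can be sketched as follows. The example assumes independent tests,
which real selection data rarely satisfy, and uses the Bonferroni
correction as one common control; the text does not say which correction
it has in mind. The figure of 14 tests echoes the 14 outcome measures
used in one of the studies reviewed below.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


n = 14  # number of tests; an illustrative choice
print(expected_false_positives(n))
print(round(familywise_error_rate(n), 3))
print(round(bonferroni_alpha(n), 4))
```

With 14 independent tests at the .05 level, the chance of at least one
spurious "significant" result is slightly better than even.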
Instability of Beta Weights

In multiple regression police research, much is often made of a
prediction equation found to predict outcomes relatively well. The
exact equation may be used in other research (e.g., Mills & Bohannon,
1980) or conceivably in making hiring decisions. If the equation is
cross-validated, this may be reasonable. Otherwise, beta weights should
be interpreted and used with caution, as they can be quite unstable,
especially if the predictors are highly intercorrelated (Kerlinger &
Pedhazur, 1973, p. 77).
Dawes (1979) convincingly summarized evidence that improper linear
models (with randomly selected weights) succeeded as well in
cross-validation as did proper beta weights. The article recommended the
simplest approach: ignore beta weights and assume each selection
variable has the same weight. Cattin (1978) pointed out, however, that
proper beta weights are superior when (a) the ratio of predictors/N is
small, (b) there are substantial differences among the weights, (c) the
R squared of the regression equation is relatively high, and (d)
predictors are not highly intercorrelated.
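The equal-weight approach can be sketched as follows. This is a
minimal illustration, not Dawes's own procedure: each predictor is
standardized and the z-scores are summed with a weight of one, after
flipping the sign of any predictor expected to relate negatively to
performance. The predictor names and values are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of raw scores to mean 0 and SD 1."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


def unit_weighted_composite(predictors, directions):
    """Sum standardized predictors with weights of +1 or -1.

    `predictors` maps a name to one raw score per applicant;
    `directions` maps the same names to +1 or -1 (the expected sign).
    """
    n = len(next(iter(predictors.values())))
    composite = [0.0] * n
    for name, raw in predictors.items():
        for i, z in enumerate(z_scores(raw)):
            composite[i] += directions[name] * z
    return composite


# Hypothetical data for four applicants.
predictors = {
    "test_score": [10, 20, 30, 40],
    "prior_firings": [2, 0, 1, 0],
}
directions = {"test_score": +1, "prior_firings": -1}
composite = unit_weighted_composite(predictors, directions)
print([round(c, 2) for c in composite])
```

Because no weights are estimated from the sample, a composite of this
kind has nothing to shrink on cross-validation, which is the practical
appeal of the approach.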
Moderator Variables

There is controversy over whether moderators are useful in
police-selection research. Moderators are variables which are not
predictors but which increase the predictive ability of predictors. For
example, Baehr et al. (1973) found that when data for black and white
officers were analyzed separately the correlation between predictors and
outcome variables was substantially higher. Spielberger, Spaulding,
Jolley, & Ward (1979) found substantial differences in which selection
variables were useful when males and females were separated.
Schmidt and Hunter (1978) argued, however, that variables such as
race and sex have no intuitive appeal as moderators. On a rational
basis, one might ask why a selection variable (e.g., intelligence) would
predict performance only with whites or only with males. On a
statistical basis, one must ask whether group differences in validity
are meaningful. Schmidt, Berner & Hunter (1973) examined all
single-group police and non-police studies on the role of race in
selector validity, and concluded that not one of the studies found a
difference between groups large enough to be statistically significant.
They noted that large differences usually are the result of chance
variation, which is especially likely to occur when one of the groups
has a relatively low number of subjects. Hunter, Schmidt and Hunter
(1979) later used a meta-analysis to examine dozens of race-moderator
studies and found no overall significant difference in validity. Hence,
there appears to be little or no statistically sound evidence to support
a role for moderators in selection research with police officers or
other workers (Dunnette & Borman, 1979; Ghiselli, 1966).
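Whether a validity difference between two groups exceeds chance can be
checked with the standard Fisher r-to-z test for two independent
correlations, sketched below. This is offered as an illustration of the
statistical question, not as the procedure used in the studies cited;
the correlations and sample sizes are invented.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se


def two_tailed_p(z):
    """Two-tailed p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical validities: .40 in a large group, .10 in a small one.
z = fisher_z_difference(0.40, 200, 0.10, 25)
print(round(z, 2), round(two_tailed_p(z), 3))
```

With these invented figures, a difference of .30 between the two
validity coefficients falls short of significance at p < .05 because the
small group contributes a large standard error.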
A final note on this subject has to do with the legal requirements
for the use of moderators. Some of the impetus for the use of race as a
moderator has come from misinterpretations of federal law. Baehr (1979)
and Pacts (1977) stated incorrectly that regulations issued by the
Equal Employment Opportunity Commission under Title VII of the Civil
Rights Act require that selection measures be validated for both whites
and minorities. In fact, the regulations require such validation only
if the measure is used so as to discriminate against minorities and only
if technically feasible. One would think that it would be a rare case
in which both qualifications are met. The use of formal and informal
quotas prevents discrimination, and technical feasibility of dual
validation rarely exists because even with quotas few members of
minority groups are hired.


Findings of Police Selection Studies

The review of findings will be presented according to type of
selection variable. The types included are biodata, measures of
intellect, measures of personality and psychopathology, interviews,
interest inventories, and subjective background ratings.
Situational tests, although used in some police departments (Mills,
McDevitt & Tonkin, 1966), are omitted because no research relevant to
their validity has apparently been conducted. Polygraphs likewise have
not been empirically evaluated as employee selection devices. For
different views of the ability of polygraphs to detect lies, see Sackett
and Decker (1979), Lykken (1979), and Podlesny and Raskin (1977).

Some published studies are excluded from the review of findings
because they had either inadequate research designs (Cross & Hammond,
1951; Finnigan, 1976; and Teeth, 1972) or no meaningful performance
outcome variable (Flynn & Peterson, 1972; Hogan, 1971; Mills et al.,
1966; Mormon, Hankey, Heywood & Liddle, 1967; Mormon, Hankey, Kennedy &
Jones, 1966; and Mullineaux, 1955). Also excluded are published
reports of studies that are so lacking in statistical or other important
information as to be uninterpretable (Cascio & Real, 1979; Colarelli &
Siegel, 1964; Furcon, Froemel & Baehr, 1973; Levy, 1973; Mills &
Stratton, 1982; and Spencer & Nichols, 1971).

It is worth noting that many of the research reports examined for
this review failed to provide even the most rudimentary listing of
selection variables tested (e.g., Spielberger, Spaulding, Jolley, & Ward,
1979). Too common are the presentation of nonsignificant correlation
coefficients without significance levels (e.g., Henderson, 1979) and the
presentation of multiple regression statistics without zero-order
correlation coefficients (e.g., Fabricatore, Azen, Schoentgen & Snibbe,
1978).

All reported associations were significant at p < .05 or better
unless otherwise indicated.

Biographical Items (Biodata)

For the purposes of this review, biodata are items concerning an
individual's life experiences. Although some researchers have referred
to data such as written responses to questions regarding motivation as
being biodata (e.g., Levy, 1973), these are more properly classified
as personality or psychological-state measures.

Because of the nature of biodata, most of the studies used
pre-selection designs. The studies that used biodata collected after
the subjects were hired will be noted.

In considering the findings, one should keep in mind that many of
the items may have been found to be uncorrelated with success in
studies which did not report nonsignificant biodata (e.g., Levy, 1967;
Spielberger, Spaulding, Jolley, & Ward, 1979). A number of specific
items listed above under Unlawful Selection Variables will be ignored
because their use in selection would or might be unlawful.
Age. Bartol (1982) used a group of officers divided into groups
rated by supervisors as average, below average and above average to
assess the relationship between several predictors, including age as a
continuous variable, and supervisor ratings. The study found that the
below-average officers were older when hired than the other two groups.
There was no significant difference between average and above average.
Five studies found no evidence that age at hiring was a valid
predictor of police performance. Levy (1967) assessed age divided into
ranges of under 24, 24-26, 27-29 and 30-above and found no association
with involuntary termination. Azen et al. (1974) assessed age as a
continuous variable and found no correlation with voluntary termination
within two years. McDonough & Monahan (1975) found no association
between age as a continuous variable and resignation, being fired, or
being promoted within two years. Gottlieb & Baker (1974) found no
association between age as a continuous variable and supervisor ratings.
Marsh (1962) assessed the validity of age with unspecified age
categories and found no association with supervisor ratings, being
discharged or auto accidents.
Snibbe et al. (1973) used 95 of Marsh's 550 subjects 10 years later
in a follow-up study and found that younger applicants were more likely
to be promoted beyond patrol officer. There was no association between
age and supervisor ratings, auto accidents, workers' compensation claims
or injuries.

One study found evidence that officers who were older when hired
performed better. Cohen & Chaiken (1972) used 14 outcome measures to
examine age in terms of three categories: 18-24, 25-29, and 30-above.
The study found that being older was associated with few total
complaints, few civil complaints and little absenteeism, but also with
no career advancement.

Overall, the findings regarding age are wildly mixed. Older
applicants may be better, worse or the same as other applicants. The
only replicated finding is that police officers who were older when
hired were less likely to be promoted, even when they otherwise
performed as well as or better than the other officers. This merely
raises doubts about the meaningfulness of promotion as an outcome
variable.

Court Appearances. Cohen & Chaiken (1972) found an association
between appearances in civil court prior to being hired and harassment.
Considering that 14 outcome measures were used in the study, and alpha
inflation was not controlled for, one should put little confidence in
this finding. The study also found that receiving a summons to testify
in a civil proceeding before being hired was associated with later
departmental awards. This predictor was also associated with being
tried for misconduct and substantiated cases of misconduct. However,
the relationship with these two outcomes was such that officers who had
received one summons to testify had far fewer of the two negative
outcomes than did officers with no summons or more than one summons.
This sort of relationship is difficult to interpret as being other than
a chance finding.

Criminal convictions and arrests. Levy (1967) found that vehicle
code violations and more serious offenses prior to being hired were
associated with involuntary termination. These findings were
cross-validated in the study. Cohen & Chaiken (1972) found no
association between petty offenses and any of 14 performance measures,
but did find that arrests before becoming an officer were associated
with fewer harassment charges while an officer. Because this finding was
in the unexpected direction and 14 outcome measures were used, one
should be cautious in placing any confidence in it.

Debts. Cohen & Chaiken (1972) and Levy (1967) examined the
validity of number of debts and both studies found no associations with
performance.


Education. The most commonly studied biodatum in police-selection
studies is amount of education. Because many police departments require
at least a high school degree (iieaavin, 1974), education usually has
been studied by evaluating the association between performance and years
of education past high school.

Two studies found evidence that more education is associated with
better performance. Cascio (1977) studied the predictive ability of
education with a group of deputies divided into 825 whites, 60 blacks,
and 55 "Spanish-surnamed." Inexplicably, two orientals were treated as
"Spanish-surnamed." Forty-four outcome measures were used, including
BARS which assessed job knowledge, initiative, dependency, attitude,
relations with others and communications. The other measures were
objective measures, of which the report listed only the ones that
correlated significantly with education.

For the white officers, 13 meaningful objective measures were
associated with education. More education was associated with a lower
incidence of 12 negative outcomes: injuries, injuries by assault and
battery, disciplinary actions due to accidents, verbal discourtesy
allegations, preventable accidents, use of force, internal reviews,
legal investigations, "personnel" complaints, false-arrest allegations,
physical force allegations, and times sick per year. However, more
education was also associated with fewer commendations.


For the black officers, more education was associated with fewer
injuries, fewer preventable accidents, fewer physical force allegations
and more commendations.

With the "Spanish-surnamed" subjects, there were two significant
correlations, one favoring more education (association with fewer
preventable accidents) and the other cutting against it (association
with more false-arrest allegations).

Hence, there was considerable evidence with regard to white
officers and black officers that more education led to better
performance. This was not true with regard to "Spanish-surnamed"
officers. As was discussed above under moderator variables, this is
unlikely to be the result of actual differences in validity.

Cohen & Chaiken (1972) examined education with 14 outcome variables
and found that two of the outcome measures were associated with more
education: career advancement by civil service exam and lack of civil
complaints against the officer.

Two studies found that more education was associated with poor
performance. Levy (1967), using educational levels ranging from less
than 11 years to more than 17, found that of the officers hired in the
past 10 years, those who were retained had substantially less education
than those who were fired. The study cross-validated the finding.
Gottlieb & Baker (1974) found that college education was associated with
low supervisor ratings.


Overall, the results are wildly mixed as to the validity of
education as a predictor of police performance. This conclusion is in
line with that of Caplan & Schmidt (1972), who reviewed the research on
education as a predictor of performance in jobs generally and found
little evidence of validity. The contradictory research findings
fit well with conflicting conceptual arguments (see Lefkowitz, 1977 &
Sparling, 1975) that more education would be positively associated with
performance (e.g., through added understanding of human behavior),
negatively associated (college graduates might find routine work
unchallenging) or unrelated.

Experience as a police officer. Leiren (1973) found an association
between prior experience as a police officer and high supervisor
ratings, but no relationship with absenteeism, commendations from the
public, and auto accidents. Considering alpha inflation, one can put
little confidence in the one "significant" finding.

Caplan & Schmidt (1977), in reviewing employee selection in
general, found little evidence of validity for prior experience.

Prior involuntary terminations. Levy (1967) found that having ever been
fired in a prior job was positively associated (p < .01) with
involuntary termination as a police officer. The study cross-validated
the finding.
Marital status. Levy (1967) found that officers who had been
married at least twice were more likely to be fired. This finding was
cross-validated in the study. Azen et al. (1973) found no effect for
number of marriages with regard to termination within two years,
absenteeism, or supervisor ratings in absolute and paired-comparison
formats.
Four studies found that the variable of being married or separated
at the time of hiring was unrelated to performance as a police officer
(Cohen & Chaiken, 1972; Gottlieb & Baker, 1974; Levy, 1967; and
Spielberger, Spaulding, Jolley & Ward, 1979).

Military disciplinary actions. Cohen & Chaiken (1972) found that
military disciplinary actions were positively associated with complaints
against an officer, departmental trials and substantiated complaints, but
not with 11 other performance variables.

Military experience. Azen et al. (1974) found that prior military
experience was associated with continuation with the department for at
least two years. However, Cohen & Chaiken (1972) found no association
with any of 14 performance variables, Levy (1967) found no association
with involuntary termination, and Gottlieb & Baker (1974) found no
association between prior military experience and supervisor ratings.
McDonough & Monahan (1975) found no association between "type of
military service" and resignation, being fired or being promoted within
two years.
Participation in athletics. Spielberger, Spaulding, Jolley, & Ward
(1979) found that for white males, lack of participation in athletics
was associated with termination without being rehirable. For females,
there was no association. Shealy (1976), however, found a positive
association between prior participation in athletics and supervisor
evaluations that an officer was corrupt. Hence, as with regard to
education and age, the results have been inconsistent, making a
conclusion on the value of the attribute impossible at this point.

Prior jobs. Levy (1967) found that long duration of prior jobs was
associated with staying on the force as opposed to being fired. This
finding was cross-validated in the study. Cohen & Chaiken (1972) found
no association between long duration of prior jobs and any of 14 outcome
measures.

Cohen & Chaiken (1972) and Levy (1967) also investigated the
predictive validity of status of prior occupations and number of prior
jobs. Neither study found any association with outcome measures.

Training in police science. Levy (1967) found no association
between having taken police science courses and involuntary termination.

Psychological disorders. Cohen & Chaiken (1972) found no
association between reports of prior psychological disorders and any of
14 outcome variables.

Summary of biodata. Overall, the studies provided encouraging
empirical evidence that five biodata are associated with poor
performance by police officers. The variable of prior involuntary
terminations was cross-validated as a predictor of involuntary
termination as a police officer (Levy, 1967), and no contrary findings
have been reported. Criminal convictions and vehicle-code convictions,
having been married more than once, and short duration of prior jobs
were all cross-validated as predictors of involuntary termination (Levy,
1967), but subsequent studies of each failed to replicate the findings.

It seems likely that a combination of these five biodata would be
more successful than any one item in predicting police-officer
performance. This conclusion is consistent with the conclusion of
reviews of non-police validity studies that groups of items can validly
predict job performance (Dunnette & Borman, 1979; Owens, 1976; and
Tenopyr & Oeltjen, 1982).


Measures of Intellect

Pre-selection validity. In one of the seven reported pre-selection
studies of published intelligence tests, McDonough & Monahan (1975)
assessed the validity of the Otis Intelligence Test and an unpublished
civil service exam by comparing groups of fired, promoted and
non-promoted officers. There were two significant group differences.
Both tests discriminated between fired and promoted and between
non-promoted and promoted. However, supervisors were aware of the
scores, so outcome-variable contamination may have contributed to the
results. Also, the report did not say whether promotion was based on
the results of promotion exams, on which more intelligent officers would
likely do better.

Blum (1964, study 3) assessed the pre-selection validity of several
measures of intellect, including the Otis, Army Alpha form SB, Army
Alpha (Intelligence Test), Terman-Merrill Intelligence Test, Moss Social
Intelligence Test, Moss Mental Alertness Test and O'Rourke's Policeman
Aptitude Test. The study found no association between any of the tests
and being fired or promoted.
Marsh (1962) assessed the pre-selection validity of a civil service
exam and its parts: word memory, sentence completion, number series
completion, arithmetic reasoning, cube and block counting, practical
judgment and memory. The study used as outcome measures departmental
rating and number of auto accidents. The results were that high scores
on the overall test, on sentence completion and on number series
completion were positively associated with high departmental ratings,
and that no correlations were found between test scores and auto
accidents.
Snibbe et al. (1973), using 95 of Marsh's subjects 10 years later,
assessed the pre-selection validity of the total test score. The study
found a positive association between high test scores and high rank but
no association with whether still on patrol, supervisor ratings, auto
accidents, workers' compensation claims or injuries.

Dubois & Watson (1950) assessed the pre-selection validity of the
Army General Classification Test (First Civilian Ed.) and found no
correlation with departmental rating. Gottlieb & Baker (1974)
investigated the pre-selection validity of the Schrammel General Ability
Test and found no association with supervisor ratings.

Post-selection predictive validity. Cohen and Chaiken (1972)
studied the post-selection predictive validity of the Otis Intelligence
Test and found that high scores predicted promotion by supervisors, but
were unassociated with 13 other performance variables.

Spielberger et al. (1979) assessed the post-selection predictive
validity of the Nelson-Denny Reading Test with groups of females and
white males. Although the test is not a measure of intellect, it is a
test of maximum performance like intelligence tests, and the study,
therefore, will be discussed here. Termination during the probationary
period without being eligible for rehiring was the sole criterion. For
white males, there was an association between termination and low scores
on the total test, on comprehension, on vocabulary and on reading rate.
For females, there was an association between termination and low
comprehension.

Blum (1964, study 4) assessed the post-selection predictive
validity of the Otis with the following outcome measures: days lost due
to illness, periods in which sick leave was taken, minor disciplinary
charges, injuries, accidents, serious charges, commendations, and
citizens' commendation. The study found that high test scores were
associated with more departmental commendations, but also with more auto
accidents and injuries.

Blum (1964, study 2) investigated the post-selection predictive
validity of the AGCT and the McCardell Test of Practical Judgment and
found no association between scores on either test and departmental
ratings on character and performance.
Concurrent validity. Henderson (1979) investigated the concurrent
validity of three measures of intellect in a two-part study. In one
part, 151 officers took the Culture Fair Test and the numerical and
verbal subtests of the Differential Aptitude Test as volunteers. In the
other part of the study, 234 officers took the Culture Fair and the
numerical and verbal subtests of the SRA as part of seeking promotion.
For both parts of the study the outcome variables were ratings by
superiors and by peers. The results were reported without significance
levels. We have determined the significance of the correlation
coefficients through a table provided by Snedecor & Cochran (1967) and
found that in each study high scores on the numerical test were
associated with high ratings by superiors and peers, but that scores on
the Culture Fair and verbal tests were not associated with superior and
peer ratings.
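Tables of this kind rest on the t test for a correlation coefficient,
t = r * sqrt(N - 2) / sqrt(1 - r^2). The sketch below shows the
conversion and the smallest correlation that reaches a given critical
t. The critical value of 1.98 is an approximate two-tailed .05 figure
for samples of this size and is an assumption of the example, not a
number taken from the report or from the table cited.

```python
import math


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing a correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def smallest_significant_r(n, t_crit):
    """Smallest |r| whose t statistic reaches the critical value."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


# Sample sizes of the two parts of the study; t_crit is an assumed value.
for n in (151, 234):
    print(n, round(smallest_significant_r(n, 1.98), 3))
```

With samples this large, even quite small correlations reach
significance, so the distinction between statistical and practical
significance matters when reading such results.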

Bass, Karstendiek, McCollough & Pruitt (1954) assessed the
concurrent validity of two Moss tests (Social Intelligence and Memory),
the Wonderlic Personnel Test, and two perceptual flexibility tests by
Thurstone (Gestalt Completion and Hidden Figures). The data for two
groups of deputies from different agencies were analyzed separately.
The outcome measure was a rating by the chief in one department and
peer ratings in the other department. The report provided zero-order
correlations without significance levels. Conversion of these into
significance levels with a conversion table (Snedecor & Cochran, 1967)
shows that none of the test scores were associated with performance
rating.

Leiren (1973) evaluated the concurrent validity of a civil service
exam and found no association with letters of commendation, number of
auto accidents, supervisor ratings or absenteeism. Cascio (1977)
investigated the concurrent validity of the California Mental Capacity
Questionnaire, using 15 objective performance measures. The multiple
correlation was nonsignificant.


Experimental tests. Leiren (1973) investigated the concurrent
validity of seven experimental measures of intellectual functioning with
four outcome measures. The findings were that high verbal-reasoning
scores were associated with commendations, few accidents, and high
supervisor ratings, that high numerical-reasoning scores were associated
with supervisor ratings, and that high vocabulary scores were associated
with few accidents. The other 23 correlations were nonsignificant,
including all involving the following tests: spatial visualization,
"best trend name," and "letter triangle." Absenteeism was not
associated with scores on any test.

Crosby, Rosenfield & Thornton (1979) developed a test intended to
assess several aspects of intelligence and tested the concurrent
validity of the test in four police departments. In each department,
the test was assessed against supervisor ratings on 15 dimensions. The
only results reported were in terms of number of significant
correlations between high test scores and high ratings, listed according
to department: (a) all 15 correlations significant; (b) 10 significant;
(c) [number illegible] significant; and (d) 3 significant in the
unexpected direction.


Using 30 officers, Martin (1923) assessed the concurrent validity
of eight experimental measures of intellect, a civil service exam and
two parts of the Army Alpha Intelligence Test (opposites and common
sense). The only finding reported is that a multiple regression
equation based on seven of the tests produced an R-squared coefficient
of .58. No significance level was reported. Using a formula for testing
R-squared (Kerlinger & Pedhazur, 1973, formula 3.12), one finds that
the R-squared was not significant.


Summary of measures of intellect. It might well be thought that
the key studies were the four that used a pre-selection design. Only
McDonough and Monahan (1975) found that high test scores predicted a
good outcome, and that was promotion. One must suspect high test scores
led to promotion in the study of McDonough and Monahan because (a)
promotion was based on the scores, or (b) promotion was based on later
tests, and officers who did well on the pre-employment tests also did
well on the promotion test. Hence, the ability of an intelligence test
to predict promotion in the study should not be interpreted as
validating the test.
. 
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The post-selection studies of published tests also provided little 


wv 


evidence of validity for measures of intellect. _ Only eight of $2 —_ . 


eT bs Md 


correlations calculated for various outcome measure showed a positive, 
significant association between high test scorer and good performance. 
Further, there were two correlations showed an association between high 
test ichven and poor performance. Thus, one is drawn to the tentative S. e 
conclusion that measures of intellect are not valid predictors of police sage 2 
performance. This conlusion stands in conkraat to that of Ghiselli 

(1966, 1973), who reviewed a large number of studies done on the 


- 


validity of measures of intellect as predictors of performan e in a wide 
array of jobs. The reviews concluded: that ‘there is eit ia o& ig 
for the general validity 6f the measures. However, it is possible that 
che job of. a pdlice officer is different ‘from other jobs in ways that 
sis high sncel Uigniicn test scores unimportant, »ipe both good and bad. i or. 
In the future, researchers could produce optimally useful findings 


by evaluating,only tests that have been published, with demonstrated 


reliability and validity as a measure of intellect or a related 


comstiuct. 


- + 
MMPI. The MMPI is the sole measure of personality and psychopathology
that has been evaluated in more than one pre-selection study, so the
MMPI will be reviewed before similar measures.

The most recently reported MMPI study was done by Bartol (1982),
who assessed pre-selection validity with ratings by the chief of police
as to whether officers were average, above-average or below-average. Of
the 13 MMPI scales, only one discriminated among groups. Subjects with
high Mf scores were more often rated as below average than average or
above average.

The study also compared the groups without the usual K-corrections
on Hs, Pd, Pt, Sc and Ma. Without the corrections all five of the
scales discriminated between average and below and between above and
below average. None discriminated between average and above average.

Bernstein, Schoenfield & Costello (1982) evaluated the
pre-selection validity of the 13 validity and clinical MMPI scales and
also the 2 scale. The study provided only statistics showing the
multiple regression correlation between all the MMPI scales and each
outcome measure. There were significant multiple correlations for the
14 scales in predicting four performance variables: disciplinary days,
citizen complaints, sick days and injuries. There was no significant
multiple correlation with disciplinary actions, grounded citizen
complaints, chargeable auto accidents, or avoidable auto accidents.

Saxe & Reiser (1976) assessed the pre-selection validity of the
MMPI and found that high scores on Hy and Pt were associated with
voluntary termination within three years. However, low scores on L, K
and Pa were also associated with termination.

McDonough & Monahan (1975) assessed the pre-selection validity of
the MMPI scales and found no difference between groups of fired,
resigned and promoted officers.

Azen et al. (1974) assessed the predictive post-selection validity
of the MMPI with three outcome measures: resignation during training or
within the first two years thereafter, an absolute rating by supervisors
and a paired-comparison rating by supervisors. Only high Mf scores were
associated with resignation. The officers were divided into two groups
for determining the correlation between scale scores and ratings. In
one group, high Pa and Ma were associated with low absolute ratings by
supervisors but not with low paired-comparison ratings. There were no
significant correlations for the other group of subjects.
Blum (1964, study 4) assessed the predictive post-selection
validity of the MMPI with 10 outcome measures: departmental
commendations, public commendations, charges of misconduct,
substantiated charges of serious misconduct, charges of minor
misconduct, auto accidents, injuries, number of periods when sick leave
was taken, total days lost due to illness and ratings of assignment
progression. Several associations were found; all were between high
scale scores and negative outcomes. No significance levels were
reported in the study, but they can be determined through a table for
translating correlation coefficients into significance levels (Snedecor
& Cochran, 1967). The significant associations were F with charges of
minor misconduct and substantiated charges of serious misconduct; Mf
with periods in which sick leave was taken; Pa with substantiated
charges; Pt with charges of minor misconduct and substantiated charges;
Sc with charges of minor misconduct, serious charges and substantiated
charges; and Ma with charges of minor misconduct and substantiated
charges of serious misconduct.


Schoenfield, Kobos & Phinney (1980) assessed the predictive
post-selection validity of the MMPI by comparing scale scores of 23
officers whom supervisors would not rehire and 46 officers who were
rated acceptable, had a commendation and had no disciplinary actions
against them. There were no significant differences between groups on
any scale. The study also assessed the ability of two psychologists to
distinguish between the two groups by clinical interpretations of MMPI
scores. Neither psychologist identified members of the two groups with
accuracy greater than chance.
Costello, Schoenfield & Kobos (1982) used the same two groups and a
randomly selected intermediate group of 92 officers to assess the
post-selection predictive validity of the Goldberg MMPI Index (L + Pa +
Sc + Hy + Pt). The results were that the index discriminated between
the two extreme groups but not between either of those and the
intermediate group.


Merian, Stefan, Schoenfield & Kobos (1980) used the data for the
same 92 officers to assess the predictive post-selection validity of the
566 MMPI items. The results were that five items were cross-validated
at p < .10 as discriminating between the low-rated and exemplary
officers. On the basis of chance, one might expect about 50 items to be
significant in one comparison and then 5 of those to be cross-validated.
Hence, the finding suggested that MMPI items were not valid predictors
of performance.
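
The chance expectation behind that argument can be sketched as simple arithmetic. The sketch assumes that no item has any true relationship with the outcome and that the original and cross-validation samples are independent; the function name is hypothetical.

```python
def expected_chance_survivors(n_tests, alpha, n_samples):
    """Expected count of null tests that reach significance at level alpha
    in every one of n_samples independent samples."""
    return n_tests * alpha ** n_samples


# 566 MMIP-style items screened at p < .10, then cross-validated once.
first_pass = expected_chance_survivors(566, 0.10, 1)
cross_validated = expected_chance_survivors(566, 0.10, 2)
print(round(first_pass, 1), round(cross_validated, 2))
```

Under these assumptions roughly 57 items would pass the first comparison by chance and between 5 and 6 would survive cross-validation, which is in line with the approximate figures of 50 and 5 given in the text.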


Speilberger, Spaulding, Jolley & Ward (1979) assessed the
predictive post-selection validity of the MMPI L scale and an
experimental sociopathy scale made up of MMPI items. The study found no
association with termination without being eligible for rehiring.

Blum (1964, study 2) studied the post-selection validity
of clinical judgment of MMPI profiles. The outcome variable was derived
by dividing subjects into poor, medium and good performance groups based
on departmental ratings, commendations and disciplinary actions. The
results were the opposite of those expected: predictions of failure
were associated with better performance.
Overall, not a single scale was found to be valid in more than one
pre-selection study. Mf was found to be a valid selection variable in
three studies, two of which were post-selection studies. High scores on
Mf were associated with low supervisor ratings (Bartol, 1982),
resignation (Azen et al., 1974) and number of periods in which sick
leave was taken (Blum, 1964, Study 4). One can only speculate whether
high Mf officers (artistic, creative, effeminate) performed poorly or
were just out of place in a tough-guy police subculture. At any rate,
the scale deserves further validation research. Scales Pa, Pt and Ma
are also deserving of further evaluation, as they showed validity in
more than one study.

It is unclear what to make of the finding of Bartol (1982) that Hs,
Pd, Pt, Sc and Ma were valid predictors only if not K-corrected. It
could be that K-corrections tend to mask valid predictor information in
the scale scores. Researchers might profitably explore this avenue.

Four scales were shown repeatedly to lack usefulness: [illegible],
K-corrected Hs, D and Si. These scales would seem undeserving of
further validation attempts.

Personality and Psychopathology Measures Other Than the MMPI

California Personality Inventory (CPI). McDonough & Monahan (1975)
investigated the pre-selection validity of the 18 CPI scales by comparing
groups of officers who, two years after being hired, had been fired, had
resigned, had been promoted, or had not been promoted. A total of 80
group comparisons were made. Of these, only two were significant at p <
.05. High Responsibility and Socialization scores were associated with
being fired as opposed to being still active. These findings were in
the unexpected direction, but in view of the number of correlations
done, one would expect at least two to be significant at p < .05 by
chance.
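
How likely two or more "significant" results are when 80 comparisons are made can be illustrated with a binomial calculation. This is a sketch under the simplifying assumptions that every null hypothesis is true and that the comparisons are independent, which group comparisons on correlated scales are not; the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha): the chance of k or more
    false positives when all n_tests null hypotheses are true."""
    below_k = sum(
        comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below_k


expected_false_positives = 80 * 0.05
p_two_or_more = prob_at_least(2, 80, 0.05)
print(expected_false_positives, round(p_two_or_more, 3))
```

Under these assumptions about four comparisons would be expected to reach p < .05 by chance, and the probability of seeing two or more is a little over .9, so two significant comparisons out of 80 carry essentially no evidential weight.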
Speilberger, Spaulding, Jolley, and Ward (1979) assessed the
predictive post-selection validity of the CPI. The study found that low
scores on the following scales correlated with termination without being
eligible for rehiring for white males: Dominance, Capacity for Status,
Sociability, Achievement through Conformance and Intellectual
Efficiency; for females: Capacity for Status, Well-Being,
Responsibility, Self-Control, Tolerance and Good Impression.


Mills & Bohannon (1980) used supervisor ratings of leadership and
suitability for police work to assess the concurrent validity of the
CPI. The study found that Tolerance, Intellectual Efficiency and
Achievement through Independence were positively associated with both
outcome measures. Socialization, Communality and Flexibility were
associated with suitability only. The other scales and Gough's (1969)
leadership index (.372 Dominance + .696 Social Presence + .305
Well-Being + .274 Achievement via Independence - .133 Good Impression)
were found not to be associated with either outcome variable.

Hogan (1971) assessed the concurrent validity of the 18 CPI scales
and found that high scores on six scales were associated with good
supervisor ratings: Well-being, Responsibility, Self-Control, Good
Impression, Achievement via Conformance, Achievement via Independence
and Psychological Mindedness.

Hogan & Kurtines (1975) assessed the concurrent validity of the CPI
with a performance variable of number of disciplinary actions. The
report provided no zero-order correlations regarding individual scales
but did state findings regarding a social maturity index (Gough, 1966;
.148 Dominance + .334 Responsibility + .512 Socialization + .227
Flexibility - .517 Good Impression - .274 Communality) and a leadership
index (Gough, 1969). The study found no association between either
index and number of disciplinary actions.

Although several studies of the CPI have been reported, only one
used a pre-selection design, and it produced inconsistent findings.
Several scales, including Achievement via Conformance, Self-Control,
Tolerance and Intellectual Efficiency, were found to have concurrent
validity in two studies, making them deserving of evaluation by
pre-selection studies.

Edwards Personal Preference Schedule (EPPS). Azen et al. (1973)
investigated the concurrent validity of the EPPS scales and found that
low scores on one of the 15 scales, Intraception, were significantly
associated with termination during training or within two years
thereafter.

Leiren (1973) also assessed the concurrent validity of the EPPS
scales. Four outcome variables were used: supervisor rating,
absenteeism, letters of commendation (source unspecified) and number of
auto accidents. Only one of the 60 correlations was significant. High
scores on scale F I (aggressively self-assured) were associated with
high supervisor ratings.

Henderson (1979) assessed the concurrent validity of the EPPS
scales with groups of 75 white officers and 40 black officers. The
report provided correlation coefficients without significance levels.
By converting these into significance levels through a table (Snedecor &
Cochran, 1967), one discovers that none of the scales were associated
with either supervisor or peer ratings for whites, but for blacks high
Order and Autonomy scores and low Heterosexuality scores were associated
with high scores on both ratings. High Endurance scores were also
associated with high supervisor ratings of blacks.

In summary, three post-selection studies produced no replicated
evidence of validity for any EPPS scale. Therefore, the value of the
EPPS in police selection must be doubted. However, only pre-selection
studies of the EPPS could provide the data needed for a firm conclusion
on the usefulness of the scale in selecting police officers.


16 Personality Factor Questionnaire (16PF). Fabricatore et al.
(1978) evaluated the concurrent validity of the 16PF by doing a
canonical correlation analysis with all 16 scales and four performance
measures: supervisor's paired-comparison rating, supervisor's absolute
rating, number of official reprimands and number of preventable
accidents. The resulting correlation, R = .27, was significant. No
zero-order correlations were provided.

Henderson (1979) assessed the concurrent validity of the 16PF with
groups of 151 police-officer volunteers and 234 officers who completed
the scale as part of screening for promotion. In the volunteer group,
high Anxiety scores were associated with high peer ratings; for the
non-volunteers, high Brightness scores were associated with high
supervisor ratings. Note that the first of the findings is
counterintuitive. Given the number of correlations that were
calculated, one should put little faith in the validity of the anxiety
and brightness findings.

The results of the two post-selection 16PF studies were mixed,
making pre-selection studies of the scale appropriate.
F Scale of Authoritarianism. Blum (1964, study 4) assessed the
predictive post-selection validity of the F scale and found a
significant relationship between high F Scale scores and minor
disciplinary charges, days lost due to illness and low number of citizen
commendations. There was no association with number of auto accidents,
serious disciplinary charges, cases with substantial evidence of
misconduct, departmental commendations, periods in which ill or number
of injuries.

Bass et al. (1954) found no association between F Scale scores and
supervisor ratings in two groups of officers. McDonough & Monahan
(1975) found no relationship with resignation, termination or promotion.

The evidence of concurrent validity for the F scale is mixed,
making pre-selection research appropriate.

Rorschach and Draw-A-Person. McDonough & Monahan (1975) assessed
the pre-selection validity of a psychologist's ratings of responses to a
group-administered Rorschach and a Draw-A-Person. The study compared
groups of fired, resigned, promoted and non-promoted officers and found
that the Rorschach ratings discriminated between fired and promoted
officers only. Draw-A-Person ratings failed to discriminate between any
two groups.

Blum (1964, study 4) assessed the predictive post-selection
validity of the same tests using his interpretations of responses. The
interpretations were not associated with any of 10 outcome variables.

The two studies of projective tests produced overall mixed evidence
of validity for clinical interpretations of the Rorschach, suggesting
that further evaluation of it may be warranted. Further evaluation of
the Draw-A-Person technique seems unwarranted, however.


Rotter Internal-External (IE) Scale. Speilberger et al. (1979)
also investigated the post-selection predictive validity of the IE Scale
and found no relationship with termination of males or females without
being eligible for rehiring. Leiren (1973) found a concurrent
relationship between high externalization on the Rotter IE Scale and
more auto accidents, but no correlation with absenteeism, letters of
commendation or supervisor's ratings. Together, the studies provided
only meager evidence that the IE Scale is valid as a predictor of
performance. However, only pre-selection studies would allow a firm
conclusion as to the validity of the scale.

Cornell Word Form 2 & Rosenzweig Picture-Frustration Study. Dubois
[...] (direction of aggression and type of frustration reaction) and
found no relationship with departmental ratings.

Gordon Personal Profile. Bass et al. (1954) investigated the
concurrent validity of the Hypersensitivity, Ascendancy, Sociability and
Responsibility scales of the Gordon Personal Profile. The report
provided correlation coefficients without significance levels, but one
can determine with a table provided by Snedecor & Cochran (1967) that
the correlations were all nonsignificant. No association was found with
peer or supervisor ratings in a group of 22 city officers or in a group
of 37 deputy sheriffs.
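
The table lookup mentioned above amounts to finding the smallest correlation that reaches significance for a given sample size. A minimal sketch follows, assuming a two-tailed test at the .05 level (the level used in the review is not stated here) and taking the critical t values from a standard t table. It uses the identity r = t / sqrt(t^2 + df) with df = N - 2; the function name is hypothetical.

```python
from math import sqrt


def critical_r(t_critical, n_cases):
    """Smallest absolute correlation that reaches significance, given the
    tabled critical t value for df = n_cases - 2."""
    df = n_cases - 2
    return t_critical / sqrt(t_critical ** 2 + df)


# Two-tailed .05 critical t values: 2.086 for df = 20, 2.030 for df = 35.
print(round(critical_r(2.086, 22), 3))  # group of 22 officers
print(round(critical_r(2.030, 37), 3))  # group of 37 deputy sheriffs
```

With groups this small, a correlation has to exceed roughly .42 (22 officers) or .32 (37 deputies) in absolute value before it counts as significant, so modest true relationships could easily go undetected.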
Guilford-Martin Temperament Inventory. Marsh (1962) assessed the
concurrent validity of the Guilford-Martin Temperament Inventory and
found that high Social Activity scores were associated with high
supervisor ratings but not with involuntary termination or rate of auto
accidents. None of the other four scales were associated with any
outcome variable. Snibbe et al. (1973), using 95 of Marsh's subjects 10
years later, assessed the validity of the Guilford-Martin General
Activity Scale and found that scores on it were not associated with any
of five outcome variables.


Humm-Wadsworth Temperament Scale. Humm & Humm (1950) assessed the
post-selection predictive validity of ratings made by "trained"
psychometrists on the basis of a number of selection criteria including
the Humm-Wadsworth Temperament Scale. Low ratings were associated with
involuntary termination. For lengthy criticisms of the study and its
findings, see Blum (1964, pp. 106-107) and Ruch (1965).

Rotter's Incomplete Sentences Blank. Bass et al. (1954)
investigated the concurrent relationship between scores on Rotter's
Incomplete Sentences Blank and peer ratings and found no association.

State-Trait Anxiety Scale. Speilberger, Spaulding, Jolley, and
Ward (1979) assessed the predictive post-selection validity of the
State-Trait Anxiety Scale and found no association with resignation
without being eligible for rehiring.

Experimental Scales. Shealy (1979) developed a social judgment
scale and assessed its concurrent validity with an unspecified number of
officers. The scale consists of 15 items intended to produce scores for
moral knowledge, socialization, conscience (internal v. external rules)
and resistance to peer pressure for immorality. The study found that
low conscience scores were associated with supervisor ratings of
corruption. Total scores and scores of the other variables were
unrelated to the ratings.

Hogan (1971) developed an empathy scale and found no concurrent
association with supervisor ratings. Bass, Karstendiek, McCollough, and
Pruitt (1954) assessed the concurrent validity of an [illegible] scale
with 22 deputies and found no association with supervisor ratings.

Baehr & Fromme (1971) assessed the concurrent validity of an
experimental psychoanalytic instrument called the Arrow-Dot Test, which
produces scores for id, ego and superego. The study used a
cross-validation design with nine outcome variables ranging from
supervisor paired-comparison ratings to sustained complaints. The
results were reported in terms of multiple regressions done with all
three test scores. Although some outcome variables were found to be
associated with the scores in one group, cross-validation was
nonsignificant.

There have been four police-officer selection studies of the TAV
(toward-away-versus), which contains parts involving preferences,
proverbs and sayings, judgment, an adjective check list and "personal
data." Each part has approximately 300 items and produces scores on
three dimensions related to Karen Horney's work: toward people
(cooperation), away (withdrawal, creative) and versus (competition,
aggression) (Mormon, Hankey, Heywood & Liddle, 1966), for a total of 15
part/dimension scores.

All four studies assessed concurrent validity. The first (Hankey,
Mormon, Kennedy & Heywood, 1965) involved four parts of the TAV, all
except "personal data." The outcome measures were supervisor ratings on
three dimensions, so in all, 36 correlations were determined. Of these,
six were significant.

The second study (Mormon, Hankey, Kennedy & Heywood, 1965) used the
first four parts of the TAV and found that four of the 12 scores
produced were associated with supervisor ratings of performance.

The third study (Mormon, Hankey, Heywood & Kennedy, 1965) used
the first four parts of the TAV and two outcome measures: supervisor's
rankings of performance and police hours/hazardous arrest. Only two of
the 24 correlations were significant, both with rankings, and they were
apparently significant in the unexpected direction.

The fourth study (Mormon, Hankey, Heywood, Liddle & Goldwhite,
1967) used all five parts of the TAV and two outcome measures:
supervisor rankings and ratings. Only the three personal data scores
were associated with ratings.

Overall, 13 of 102 correlations were significant in the expected
direction, but only one score was found to be significant in two
studies, Adjective Check List-Versus. However, the three personal
data scores were found to have concurrent validity in the only study in
which they were tested. Hence, the Adjective Check List-Versus and the
three personal data scores may merit pre-selection research.


Summary of measures other than the MMPI. Few of the studies of
personality and psychopathology measures other than the MMPI used
pre-selection designs, and the pre-selection studies did not provide any
replicated evidence of validity. Several CPI scales and one TAV scale
were found to have validity in more than one post-selection study, but
the findings should be considered suggestive only in view of the
limitations of post-selection research. More pre-selection studies need
to be done before one can safely conclude that any of these measures is
a valid predictor of police performance. Further research on
personality/psychopathology measures seems warranted in view of the
widespread use of the measures in actual police selection (Abramson,
1974; Parisher, et al. 1979) and the conclusion of Ghiselli (1966; 1973)
that personality tests have been shown to be valid selection variables
in a wide array of jobs.

Interest Inventories

Strong-Campbell. Speilberger, Spaulding, Jolley, and Ward (1979)
examined the predictive post-selection validity of the Strong-Campbell
Interest Inventory (SCII), which has 23 interest scales and 124
occupation scales (Anastasi, 1976, p. 531). Using resignation within
two years without being eligible for rehiring as the outcome variable,
the study found that for male officers, high scores on four scales were
associated with leaving the force: business management, office
practices, female army officer and male [illegible]. For females, there
were no associations. For neither sex were scores on police-officer or
highway-patrol scales associated with remaining on the force. Because
294 correlations were apparently calculated, one would expect more than
four to appear significant merely by chance. Hence, the study suggests
that the scale would not be useful as a selection variable.

Blum (1964, study 4) assessed the predictive post-selection
validity of the Strong Vocational Interest Blank (SVIB), the predecessor
of the SCII. It had 54 occupational scales and four non-occupational
scales (Anastasi, 1968). The study, using 10 outcome measures, found
that five job interests were associated with one or more outcome
measures: psychologist interests with auto accidents; physicist
interests with [illegible]; [illegible] interests with auto accidents;
[illegible] interests with charges of serious misconduct and injuries;
and policeman interests with days lost due to illness. High physician
interests and carpenter interests were each associated with bad
performance on at least one outcome variable and good performance on
another. Thus, nine of apparently 580 correlations were significant,
less than the number one would expect by chance. Also noteworthy is
that none of the "significant" scales matched a "significant" one in the
study of Speilberger et al. (1979). Dubois & Watson (1950) also
assessed the concurrent validity of the SVIB and found no association
with service rating.
Kuder Preference Record. Sterne (1960) carried out a concurrent
validity study of the five scores provided by the Kuder Preference
Record and found no association with performance ratings.
Marsh (1962) found no concurrent association between the five
scores and involuntary termination, supervisor ratings or auto
accidents. Snibbe, Azen, Montgomery & Marsh (1973), using 95 of Marsh's
subjects 10 years later, assessed the validity of the Mechanical scale
and the Social Service scale of the Kuder and found high Mechanical
scores were associated with two measures of promotion and with high
supervisor ratings, but not with workers' compensation claims or
injuries. Social Service scale scores were not associated with any of
the five performance variables.


Summary of Interest Inventories. Overall, the studies suggest that
interest inventories are not useful in selecting police officers. This
should come as no surprise, since the inventories were designed not for
screening but for vocational counseling. Further, studies have shown
that responses to the items can easily be faked (Kirchner, 1962;
Bridgman & Hollenbeck, 1961).


Interviews 


Landy (1976) assessed the pre-selection validity of interviews done
by a three-person panel. During a 45-minute interview, each applicant
was rated on nine dimensions and also given a recommendation as to
hiring. The nine interview ratings were reduced to three factors
through factor analysis: manifest motivation, communication and
personal stability. Supervisor ratings on nine dimensions were factor
analyzed and then reduced to four outcome factors: professional
maturity, technical competence, demeanor and communication. The study
found that the hiring recommendation of the interviewers was unrelated
to any of the outcome factors. The study also found, however, that the
motivation selection factor was associated with the competency and
demeanor performance factors and that the personal stability selection
factor was associated with the communication and competency performance
factors.

McDonough & Monahan (1975) assessed the pre-selection validity of
one-hour "psychiatric interviews." The study found that the interview
ratings failed to discriminate among groups of fired, promoted and
non-promoted officers. It is noteworthy that the interviewer ratings
had an interrater reliability of only .41, making the variable of little
value.

Dubois & Watson (1950) assessed the pre-selection validity of
ratings based on an interview done with five police-officer
interviewers. Each applicant was rated on five dimensions: appearance,
manner, speech, adaptability and general impression. These ratings were
then summed for an interview score, which was found to be unrelated to
service ratings.

Marsh (1962) assessed the pre-selection validity of "civil service"
interview scores and found no association with involuntary termination,
supervisor ratings or rate of auto accidents.

Overall, the four pre-selection interview studies provided
virtually no support for the validity of interviews in the selection of
police officers. This conclusion is similar to that of Tenopyr &
Oeltjen (1982) and Guion (1976), who found little or no evidence that
supported the validity of interviews in screening for any job. It is
unclear what to make of the validity of Landy's interview factors.
Perhaps studies should be done of such factors, rather than of global
ratings. Ordinary interview ratings, however, appear to be undeserving
of further research.
3 Subjective Background Ratings = 


Cohen & Chaiken (1972) examined the pre-selection validity of
background ratings made by an officer who reviewed the records of
applicants and interviewed the applicants and sometimes their friends,
neighbors, and employers. The study used 14 outcome measures and found
that the ratings were positively associated with civil service
promotion, few total complaints, few trials, few substantiated
complaints, and few times sick. There was no association with promotion
by other than civil service, awards, criminal complaints, [illegible]
sick, injury claim disapprovals or arrests.

McAlister (1970) also examined the pre-selection validity of
background ratings. The study found no evidence of validity with regard
to awards, supervisor rating or five measures relating to sick leave and
injuries.

The two studies thus produced mixed results that suggest that
further exploration of this variable would be appropriate.


Discussion
The review of police selection validation studies leads to the
following recommendations regarding research methods and reports: (a)
use only pre-selection designs; (b) when comparing different performance
groups to test the validity of a variable, make sure that the groups do
not differ with regard to a confounding variable; (c) eliminate, if
possible, outcome variable contamination, and, if that is not feasible,
at least assess the extent of the problem and mention it with the
findings; (d) use only meaningful outcome measures, which do not include
selection for hiring, training performance, or long tenure; (e) control
for different levels of opportunity with outcome measures like number of
arrests, and if this is not possible, assess the extent of the
opportunity variance and report it with the findings; (f) use outcome
variables that cover as many as possible of the important functions of
police officers, including [illegible] functions such as mediating
family disputes and writing reports; (g) use only outcome measures that
have enough variance to allow a possibility of a significant
relationship with the selection variable being tested; (h) when using
ratings as outcome variables, consider using alternatives to absolute
supervisor ratings, including peer ratings and various rating formats
that may lend themselves to more precise ratings, such as
paired-comparisons and rankings; (i) when using ratings, make sure that
the raters have the training, information and incentive needed to make
valid ratings; (j) in reporting results, follow the Guidelines for
Reporting Criterion-Related and Content Validity of the Office of
Federal Contract Compliance (Anastasi, 1976) and report 0-order
correlations for all combinations of selection and outcome variables;
(k) recognize and report problems of alpha inflation; (l) place little
faith in beta weights unless the rules set out by Cattin (1978) are met;
and (m) do not put great effort into a hunt for racial or gender
moderators.
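
The alpha-inflation problem named in recommendation (k) can be made
concrete with a little arithmetic. The following is a minimal sketch,
not taken from any of the reviewed studies. It assumes the significance
tests are independent, which is rarely true of correlated outcome
measures, so the result is a rough guide only. The function names
familywise_error_rate and bonferroni_alpha are hypothetical helpers
introduced for illustration, and the Bonferroni adjustment is shown only
as one common, conservative correction that this review does not itself
prescribe.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n_tests tests.

    Assumes the tests are independent (an illustrative simplification).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha that keeps the familywise rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical scenario: one selection variable tested against
    # 14 outcome measures, each at the conventional .05 level.
    n_outcomes = 14
    print(round(familywise_error_rate(0.05, n_outcomes), 3))
    print(round(bonferroni_alpha(0.05, n_outcomes), 4))
```

Under these assumptions, a study that tests one selection variable
against 14 outcome measures at the .05 level, as Cohen & Chaiken (1972)
did, has roughly a 51% chance of producing at least one "significant"
association even if no true relationship exists.
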
In reaching a conclusion as to the validity of the selection
variables examined, one must establish a standard for what qualifies as
validation. A reasonable standard is that for a selection variable to
be considered valid, it must be cross-validated in one study or
validated in two or more studies.
Only five biodata items have met this standard. Cross-validated
predictors of poor police-officer performance are (a) prior involuntary
termination, (b) having been married more than once, (c) vehicle-code
violations, (d) more serious criminal offenses, and (e) short duration
of prior jobs.

Several pre-selection and post-selection studies of measures of
intellect provided overall only slight evidence of validity. Two
pre-selection studies of subjective background ratings likewise produced
only mixed evidence of validity.

Five pre-selection studies of the MMPI produced no replicated
findings of validity, but some MMPI scales were found to have
post-selection validity in more than one study: Mf, Pd, Pt and Sc.
Because of inherent deficiencies in post-selection research, the
findings of post-selection validity should be considered suggestive
only. Few pre-selection studies of other personality and
psychopathology measures have been reported and these provided, at best,
mixed evidence of validity. However, several scales were found to have
post-selection validity in more than one study: the Achievement via
Conformance, Self-Control, Tolerance and Intellectual Efficiency scales
of the CPI and the Adjective Check List-Versus scale of the experimental
TAV.
Studies of interest inventories and interviews consistently found
no meaningful evidence of validity.


It would seem that future attempts to find pre-selection predictors
of police-officer performance would best be directed at biodata, the
MMPI, other measures of personality, measures of intellect, and
subjective background ratings. Interviews and interest inventories
appear, at this time, not to be promising variables for further
validation research regarding police officers.